Implementation of the Knuth-Morris-Pratt (KMP) string matching algorithm #403

sjathin · 2021-06-05T20:43:13Z

References to other Issues or PRs or Relevant literature

Fixes #400.
This PR adds an implementation of the KMP string matching algorithm.

Brief description of what is fixed or changed

The Knuth-Morris-Pratt algorithm, also known as KMP, is a string matching algorithm that turns the search string into a finite state machine and then runs that machine with the string to be searched as its input. Execution time is O(m+n), where m is the length of the search string and n is the length of the string to be searched.[1]
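
For readers unfamiliar with the "finite state machine" step, it is commonly stored as a failure (prefix) table: for each position in the search string, the length of the longest proper prefix that is also a suffix ending there. The following is a minimal sketch of that preprocessing step only. The name `build_failure_table` is hypothetical, and a plain Python list is assumed here in place of the `OneDimensionalArray` used in this PR.

```python
from typing import List


def build_failure_table(pattern: str) -> List[int]:
    """Return the KMP failure table for `pattern` (hypothetical helper).

    table[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it.
    """
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table
```

Building the table takes O(m) time, and the later scan over the text takes O(n), which is where the O(m+n) bound comes from.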

codecov · 2021-06-05T20:44:08Z

Codecov Report

Merging #403 (fff59fe) into master (0dd2c03) will increase coverage by 0.043%.
The diff coverage is 100.000%.

@@              Coverage Diff              @@
##            master      #403       +/-   ##
=============================================
+ Coverage   98.574%   98.618%   +0.043%     
=============================================
  Files           25        26        +1     
  Lines         3297      3401      +104     
=============================================
+ Hits          3250      3354      +104     
  Misses          47        47

Impacted Files	Coverage Δ
pydatastructs/strings/__init__.py	`100.000% <100.000%> (ø)`
pydatastructs/strings/algorithms.py	`100.000% <100.000%> (ø)`
pydatastructs/linear_data_structures/__init__.py	`100.000% <0.000%> (ø)`
pydatastructs/linear_data_structures/algorithms.py	`99.715% <0.000%> (+0.061%)`	⬆️

czgdp1807 · 2021-06-06T13:31:24Z

pydatastructs/strings/string_matching_algorithms.py

+    False
+
+    """
+    return eval(algorithm + "('" + text + "','" + pattern + "')")


Avoid using eval. Please use the pattern similar to the one shown below,

pydatastructs/pydatastructs/graphs/algorithms.py

Lines 314 to 321 in 0dd2c03

import pydatastructs.graphs.algorithms as algorithms
func = "_minimum_spanning_tree_" + algorithm + "_" + graph._impl
if not hasattr(algorithms, func):
    raise NotImplementedError(
        "Currently %s algoithm for %s implementation of graphs "
        "isn't implemented for finding minimum spanning trees."
        %(algorithm, graph._impl))
return getattr(algorithms, func)(graph)
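
Applied to this PR, the same lookup pattern might look roughly like the sketch below. This is an illustration under assumptions, not the code that was merged: the algorithm key `'knuth_morris_pratt'` and the `"_" + algorithm` naming scheme are hypothetical, `globals()` stands in for the imported module object so that the snippet is self-contained, and the matcher body is a placeholder.

```python
def _knuth_morris_pratt(text: str, pattern: str) -> bool:
    # Placeholder body for illustration only; a real implementation
    # would run the KMP scan here.
    return pattern in text


def find_string(text: str, pattern: str, algorithm: str) -> bool:
    # Build the private function name from the algorithm key
    # (hypothetical naming scheme).
    func = "_" + algorithm
    # globals() stands in for `import ... as algorithms` followed by
    # hasattr/getattr on that module, as in the snippet quoted above.
    impl = globals().get(func)
    if not callable(impl):
        raise NotImplementedError(
            "Currently the %s algorithm for string matching "
            "isn't implemented." % algorithm)
    # Arguments are passed as values, never spliced into source code.
    return impl(text, pattern)
```

Unlike the `eval` version, a text or pattern containing a quote character (for example `"it's"`) cannot change what code gets executed, and an unknown algorithm name fails with a clear `NotImplementedError`.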

czgdp1807 · 2021-06-06T13:32:06Z

pydatastructs/strings/string_matching_algorithms.py

+    return eval(algorithm + "('" + text + "','" + pattern + "')")
+
+
+def kmp(string: str, substring: str) -> bool:


It would be better to name it _knuth_morris_pratt.

The documentation is not needed here, as it would be a non-public function.

czgdp1807 · 2021-06-06T13:34:36Z

pydatastructs/strings/string_matching_algorithms.py

+    'find_string'
+]
+
+def find_string(text: str, pattern: str, algorithm: str) -> bool:


The documentation for this should have the list of supported algorithms. For example,

pydatastructs/pydatastructs/graphs/algorithms.py

Lines 270 to 277 in 0dd2c03

algorithm: str
    The algorithm which should be used for
    computing a minimum spanning tree.
    Currently the following algorithms are
    supported,
    'kruskal' -> Kruskal's algorithm as given in
    [1].
    'prim' -> Prim's algorithm as given in [2].

Full doc string of the above example is as follows,

pydatastructs/pydatastructs/graphs/algorithms.py

Lines 260 to 312 in 0dd2c03

"""
Computes a minimum spanning tree for the given
graph and algorithm.

Parameters
==========

graph: Graph
    The graph whose minimum spanning tree
    has to be computed.
algorithm: str
    The algorithm which should be used for
    computing a minimum spanning tree.
    Currently the following algorithms are
    supported,
    'kruskal' -> Kruskal's algorithm as given in
    [1].
    'prim' -> Prim's algorithm as given in [2].

Returns
=======

mst: Graph
    A minimum spanning tree using the implementation
    same as the graph provided in the input.

Examples
========

>>> from pydatastructs import Graph, AdjacencyListGraphNode
>>> from pydatastructs import minimum_spanning_tree
>>> u = AdjacencyListGraphNode('u')
>>> v = AdjacencyListGraphNode('v')
>>> G = Graph(u, v)
>>> G.add_edge(u.name, v.name, 3)
>>> mst = minimum_spanning_tree(G, 'kruskal')
>>> u_n = mst.neighbors(u.name)
>>> mst.get_edge(u.name, u_n[0].name).value
3

References
==========

.. [1] https://en.wikipedia.org/wiki/Kruskal%27s_algorithm
.. [2] https://en.wikipedia.org/wiki/Prim%27s_algorithm

Note
====

The concept of minimum spanning tree is valid only for
connected and undirected graphs. So, this function
should be used only for such graphs. Using with other
types of graphs may lead to unwanted results.

Adding note is optional in a doc string.
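
Following that layout, a doc string for find_string might look roughly like the sketch below. The wording, the algorithm key `'knuth_morris_pratt'`, and the return name `found` are assumptions made for illustration, and the function body is only a placeholder so that the snippet runs on its own.

```python
def find_string(text: str, pattern: str, algorithm: str) -> bool:
    """
    Checks whether the given pattern occurs in the given text.

    Parameters
    ==========

    text: str
        The string in which the pattern is to be searched.
    pattern: str
        The string which is to be searched for.
    algorithm: str
        The algorithm which should be used for searching.
        Currently the following algorithms are supported,
        'knuth_morris_pratt' -> Knuth-Morris-Pratt
        algorithm as given in [1].

    Returns
    =======

    found: bool
        True if the pattern occurs in the text, False otherwise.

    Examples
    ========

    >>> find_string("Knuth-Morris-Pratt", "-Morris-", "knuth_morris_pratt")
    True

    References
    ==========

    .. [1] https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
    """
    if algorithm != "knuth_morris_pratt":
        raise NotImplementedError(
            "Currently the %s algorithm isn't implemented." % algorithm)
    # Placeholder body for illustration only; the real function would
    # dispatch to the private KMP implementation.
    return pattern in text
```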

czgdp1807 · 2021-06-06T13:35:05Z

pydatastructs/strings/string_matching_algorithms.py

+    return patterns
+
+
+def _doMatch(string: str, substring: str, patterns: OneDimensionalArray) -> bool:


Please follow snake case instead of camel case.

_doMatch -> _do_match. It would be better to define this function inside _knuth_morris_pratt, since for now it is called only within that scope.

czgdp1807 · 2021-06-06T13:36:01Z

pydatastructs/strings/tests/test_string_matching_algorithms.py

+def _test_common_string_matching(algorithm):
+    true_text_pattern_dictionary = {
+        "Knuth-Morris-Pratt": "-Morris-",
+        "abcabcabcabdabcabdabcabca": "abcabdabcabca",
+        "aefcdfaecdaefaefcdaefeaefcdcdeae": "aefcdaefeaefcd",
+        "aaaaaaaa": "aaa",
+        "fullstringmatch": "fullstringmatch"
+    }
+    for test_case_key in true_text_pattern_dictionary:
+        assert find_string(test_case_key, true_text_pattern_dictionary[test_case_key], algorithm) is True
+
+    false_text_pattern_dictionary = {
+        "Knuth-Morris-Pratt": "-Pratt-",
+        "abcabcabcabdabcabdabcabca": "qwertyuiopzxcvbnm",
+        "aefcdfaecdaefaefcdaefeaefcdcdeae": "cdaefaefe",
+        "fullstringmatch": "fullstrinmatch"
+    }
+
+    for test_case_key in false_text_pattern_dictionary:
+        assert find_string(test_case_key, false_text_pattern_dictionary[test_case_key], algorithm) is False


Nice work on test cases.

czgdp1807 · 2021-06-06T13:37:48Z

pydatastructs/strings/string_matching_algorithms.py

+    return _doMatch(string, substring, patternsInSubString)
+
+
+def _buildPattern(substring: str) -> OneDimensionalArray:


Same suggestions as in _doMatch.

czgdp1807 · 2021-06-06T13:38:40Z

pydatastructs/strings/__init__.py

-from . import trie
+from . import (
+    trie,
+    string_matching_algorithms


Please rename the file from string_matching_algorithms.py to algorithms.py. We will keep all the string-related algorithms in this file.

czgdp1807 · 2021-06-06T13:39:11Z

Thanks for the PR. Left some suggestions.

pydatastructs/strings/algorithms.py

czgdp1807 · 2021-10-10T14:08:04Z

Thanks @sjathin for this.

Implementation of the Knuth-Morris-Pratt (KMP) string matching algorithm

d855df4

sjathin requested a review from czgdp1807 June 5, 2021 20:43

czgdp1807 reviewed Jun 6, 2021

czgdp1807 added algorithms strings strings.algorithms labels Jun 6, 2021

czgdp1807 added 2 commits October 10, 2021 19:01

API improvements

007a1aa

Added tests

e588dad

czgdp1807 reviewed Oct 10, 2021

pydatastructs/strings/algorithms.py (outdated)

Updated reference to point to Wikipedia

fff59fe

czgdp1807 merged commit 7878ee4 into codezonediitj:master Oct 10, 2021
